Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs

نویسندگان

  • Ichitaro Yamazaki
  • Stanimire Tomov
  • Tingxing Dong
  • Jack J. Dongarra
چکیده

The Generalized Minimum Residual (GMRES) method is a popular Krylov subspace projection method for solving a nonsymmetric linear system of equations. On modern computers, communication is becoming increasingly expensive compared to arithmetic operations, and a communication-avoiding variant (CA-GMRES) may improve the performance of GMRES. To further enhance the performance of CAGMRES, in this paper, we propose two techniques, focusing on the two main computational kernels of CA-GMRES, tall-skinny QR (TSQR) and matrix powers kernel (MPK). First, to improve the numerical stability of TSQR based on the Cholesky QR (CholQR) factorization, we use higherprecision arithmetic at carefully-selected steps of the factorization. In particular, our mixed-precision CholQR takes the input matrix in the standard 64-bit double precision, but accumulates some of its intermediate results in a software-emulated double-double precision. Compared with the standard CholQR, this mixed-precision CholQR requires about 8.5× more computation but a much smaller increase in communication. Since the computation is becoming less expensive compared to the communication on a newer computer, the relative overhead of the mixedprecision CholQR is decreasing. Our case studies on a GPU demonstrate that using higher-precision arithmetic for this small but critical segment of the algorithm can improve not only the overall numerical stability of CA-GMRES but also, in some cases, its performance. We then study an adaptive scheme to dynamically adjust the step size of MPK based on the static inputs and the performance measurements gathered during the first restart loop of CA-GMRES. Since the optimal step size of MPK is often much smaller than that of the orthogonalization kernel, the overall performance of CA-GMRES can be improved using different step sizes for these two kernels. Our performance results on multiple GPUs show that our adaptive scheme can choose a near optimal step size for MPK, reducing the total solution time of CA-GMRES.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 64-bit floating-point precision, but accumulates its intermediate results in the doubled-precision. When the target hardware does not support the desired higher precision, we use software emulation. Compared with the standard orthogonalization scheme, we require about 8.5× more computation but a much...

متن کامل

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32 or 64-bit floating-point precision, but uses higher-precision arithmetics to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetics, and requires about 20× more computation but about the same amount of communication as...

متن کامل

Breakthroughs in Sparse Solvers for GPUs

The CUDA Center of Excellence (CCOE) at UTK targets the development of innovative algorithms and technologies to tackle challenges in Heterogeneous High Performance Computing. Over the last year, the CCOE at UTK developed CUDA-based breakthrough technologies in sparse solvers for GPUs. Here, we describe the main ones – a sparse iterative solvers package, a communication-avoiding (CA) sparse ite...

متن کامل

A new adaptive GMRES algorithm for achieving high accuracy

GMRES(k) is widely used for solving nonsymmetric linear systems. However, it is inadequate either when it converges only for k close to the problem size or when numerical error in the modified Gram–Schmidt process used in the GMRES orthogonalization phase dramatically affects the algorithm performance. An adaptive version of GMRES(k) which tunes the restart value k based on criteria estimating ...

متن کامل

The Effect of Measurement Errors on the Performance of Variable Sample Size and Sampling Interval Control Chart

The effect of measurement errors on adaptive and non-adaptive control charts has been occasionally considered by researchers throughout the years. However, that effect on the variable sample size and sampling interval (VSSI) control charts has not so far been investigated. In this paper, we evaluate the effect of measurement errors on the VSSI control charts. After a model development, the effe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014